Skip to content

Fix 1208#1218

Merged
HannoSpreeuw merged 17 commits into
amusecode:mainfrom
HannoSpreeuw:Fix_1208
Apr 28, 2026
Merged

Fix 1208#1218
HannoSpreeuw merged 17 commits into
amusecode:mainfrom
HannoSpreeuw:Fix_1208

Conversation

@HannoSpreeuw

Copy link
Copy Markdown
Collaborator

Trying to fix #1208, but not there yet, i.e. this will compile but not display a meaningful error message like "Hey, you have inserted too many stars, with respect to the Sapporo Light limit of $2^{17}=131,072$".

More specifically, Erwan's script will run flawlessly with $<2^{17}$ stars, but yield

amuse.support.exceptions.CodeException: Exception when calling function 'commit_particles', of code 'ph4Interface', exception was 'Error in code: no error message - code probably died, sorry.'

with $2^{17} + 1$ stars.

This MR aims to propagate nj_max to "higher levels", i.e. to PH4 such that the controlling Python script - Erwan's script - can access it.

Adding a line print(f"{code.get_nj_max() = }, ") to that script should yield $131072$ but instead yields

TypeError: ph4Interface.get_nj_max() takes 0 positional arguments but 1 was given

I have tried to work around this, but I have not succeeded.

Perhaps @LourensVeen could shed some light on this?

The goal here is to make "nj_max" accessible at the level of the controlling Python script.
The approach is to add a Sapporo Light function that will return "nj_max", next add a PH4 function that can call this function and, subsequentlty, provide a Python interface to this function, i.e. a layered approach.
According to the AMUSE guideline https://amuse.readthedocs.io/en/latest/tutorial/legacy_code.html we should be able to leave "src/amuse_ph4/interface.cc" untouched, i.e. exactly the same as in the "main" branch.
I.e., this code should not be necessary.
In "sapporoG6lib.cpp" we can do "return grav.get_nj_max()" since that has "sapporo grav;".
This compiles, i.e.
"./setup install sapporo_light" and "./setup install amuse-ph4" will complete without error.
However, Erwan's script with an additional "print(f"{code.get_nj_max() = }, ")" will yield
"
TypeError: ph4Interface.get_nj_max() takes 0 positional arguments but 1 was given
"
@HannoSpreeuw HannoSpreeuw requested a review from LourensVeen March 6, 2026 10:11
@HannoSpreeuw HannoSpreeuw self-assigned this Mar 6, 2026
@HannoSpreeuw HannoSpreeuw requested a review from a team as a code owner March 6, 2026 10:11
@HannoSpreeuw HannoSpreeuw added kind: feature request The issue requests a feature that AMUSE does not currently have status: help wanted Help is needed or at least wanted from someone with more expertise in the area priority: low Issues that do not need to be fixed quickly, or perhaps at all labels Mar 6, 2026

@LourensVeen LourensVeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix the 0 positional argoments error at least.

Comment thread src/amuse_ph4/interface.py
@github-project-automation github-project-automation Bot moved this from Backlog to Open PRs in Development Board Mar 6, 2026
This function definition does not need to be separated.
1) 'extern "C"' not needed here
2) We need to adhere to AMUSE standards, I guess the legacy decorator implies that we need to set an argument 'int * value' and '*value = jd->get_nj_max(); return 0;' or we will be returned an empty list instead of an integer.
3) As a consequence of 2) we need to add an argument to the function: 'function.addParameter(...'.
"interface.cc" should have been added as part of the previous commit.
Essential is the '#ifdef GPU', since, when one executes "./setup install amuse-ph4" one will not link with Sapporo as one would when executing "./setup install amuse-ph4-sapporo".
So "GPU" and "Sapporo" go hand in hand and one needs to make sure that "./setup install amuse-ph4" will complete without error when no GPU is available.
@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

@LourensVeen with these recent commits, this will build and run, i,.e. if one adds a line

    print(f"{code.get_nj_max() = }, ")

to Erwan's script after the line

    channel_a.copy()

it will print 131072, which means that the user gets to know the maximum number of stars that Sapporo Light allows for and adjust his/her Python script accordingly.

Closes #1208

@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

Closes #1208

In the sense that the user has access to nj_max.

But when running a simulation with > nj_max stars, you still do not get a meaningful error message.

@LourensVeen LourensVeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I have a few small changes to request. Thanks!

Comment thread lib/sapporo_light/sapporo.h Outdated
Comment thread lib/sapporo_light/sapporo.h Outdated
Comment thread src/amuse_ph4/src/jdata.cc Outdated
Following@LourensVeen recommendation to prevent things from going wrong when there's a header include loop.
Following @LourensVeen's recommendation: When running PH4 simulations on a CPU instead of a GPU, Sapporo Light's "nj_max" does not play a role, so one will want to return a different very large number indicating the maximum number of stars allowed for the simulation. In case the number of stars in a simulation running on a CPU will approach "nj_max = std::numeric_limits<int>::max();" one will probably run into a memory error, but that's another matter.
This print statement should be displayed in the terminal where the controlling Python script is executed if "redirection=none" has been added as an argument to, e.g., ph4.
1) An error in deploying Sapporo Light only occurs if "nj > nj_max", i.e. strictly larger.
2) Extra blank line.
@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

But when running a simulation with > nj_max stars, you still do not get a meaningful error message.

This has now been fixed, i.e. if the Python script has a line

    code = Ph4(converter, redirection="file", mode="gpu")

then there will be a line

ERROR: nj = 131073 exceeds GPU capacity nj_max = 131072

in the code.out output file.

@LourensVeen LourensVeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment but then this is ready to merge.

Comment thread lib/sapporo_light/sapporo.h Outdated
Addressing comment amusecode#1218 (comment) from @LourensVeen:
"
No need for #ifdef __cplusplus here, this file has a class in it so it must always be compiled with a C++ compiler, a C compiler is going to error anyway.
"
HannoSpreeuw and others added 2 commits April 20, 2026 16:56
"test7" is a unit test asserting that two parallel calls to a one second sleep should together takes less than two seconds. Apparently, this frequently fails on MacOS runners (@LourensVeen, private communication). Therefore, it seems better to bypass this test for that platform.
@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

Codacy Static Code Analysis is complaining about an unused function, i.e. the abort in line 71 of send_fetch_data.cpp

    if (nj > nj_max) {
      fprintf(stdout, "ERROR: nj = %d exceeds GPU capacity nj_max = %d\n", nj, nj_max);
      abort;
    }

but that complaint does not make sense to me, nor does the other complaint - Error prone Method 'get_nj_max' has no argument.

@LourensVeen LourensVeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's merge this!

@LourensVeen

Copy link
Copy Markdown
Member

I think the unused function warning Codacy givesmay be about the nj variable, and that it wants you to write if (address_j.size() > nj_max) and avoid the extra variable.

I'm not sure though, and this is reported so spectacularly poorly that I suggest we leave it as is if only out of spite.

@github-project-automation github-project-automation Bot moved this from Open PRs to Done in Development Board Apr 28, 2026
@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

I think the unused function warning Codacy givesmay be about the nj variable, and that it wants you to write if (address_j.size() > nj_max) and avoid the extra variable.

Okay, but

    int nj = address_j.size();

is not part of this MR and nj is needed below.

@HannoSpreeuw HannoSpreeuw merged commit f3149e6 into amusecode:main Apr 28, 2026
3 of 4 checks passed
@HannoSpreeuw HannoSpreeuw deleted the Fix_1208 branch April 28, 2026 11:10
@ErwanH29

Copy link
Copy Markdown
Collaborator

Hey Hanno! I appreciate you working on this, but I am not sure if it is completely fixed.
I recently rebuilt the package and when I went above the particle limit, the code crashed giving the same error message as #1208. I went on to manually increase the particle number, and still the same outcome happened.

After several trials, I built from commit 8dd292c and applied your original fix manually by changing nj_max. With that version, the code worked.

So it looks like something introduced after that commit may have broken or interfered with the fix.

@HannoSpreeuw

Copy link
Copy Markdown
Collaborator Author

I recently rebuilt the package and when I went above the particle limit, the code crashed giving the same error message as #1208.

And it should, but you should now be able to find a meaningful error message in your output file if you set redirection="file".

After several trials, I built from commit 8dd292c and applied your original fix manually by changing nj_max. With that version, the code worked.

I guess by original fix you mean manually increasing nj_max = 131072;?
It is always an option to set nj_max to the number of stars you need and recompile, but as far as I recall, it was not the idea of this MR to set nj_max to a number much larger than $2^{17}$, since that would just shift the problem.
However, please let me know if you want a different default value, that is obviously super easy to implement.

Ideally, one would like to have nj_max set through the controlling Python script, but that would require a major code overhaul; currently nj_max is set at compile time, if you would want to set that at run time, this would involve quite some effort.

@ErwanH29

Copy link
Copy Markdown
Collaborator

Oh, I probably should have read this more carefully. My bad. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind: feature request The issue requests a feature that AMUSE does not currently have priority: low Issues that do not need to be fixed quickly, or perhaps at all status: help wanted Help is needed or at least wanted from someone with more expertise in the area

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Ph4 gpu crashes when the number of particles exceeds 2^17

3 participants